Excel BI - Excel Challenge 928

excel-challenges

excel-formulas

🔰 Extract status labels and trailing hour values from text, then summarize hours by status with a total row.

Published

March 23, 2026

Illustration for Excel BI - Excel Challenge 928

Challenge Description

🔰 Extract the status and the time given in hours from each text row, summarize the total hours by status, and append a final total row. The supplied implementations solve this by extracting the status label and the trailing numeric hours value from each string.

Solutions

library(tidyverse)
library(readxl)

path <- "900-999/928/928 Summarize Status Hours.xlsx"
input <- read_excel(path, range = "A2:A27")
test <- read_excel(path, range = "C2:D7")

result <- input %>%
  mutate(
    Status = str_extract(Data, "Completed|In-Progress|On-Hold|Pending"),
    Hours = str_extract(Data, "\\d{1,3}(?:\\.\\d+)?(?=\\D*$)") |> as.numeric()
  ) %>%
  summarise(`Total Hours` = sum(Hours), .by = Status) %>%
  arrange(desc(`Total Hours`)) %>%
  bind_rows(
    summarise(., Status = "Total", `Total Hours` = sum(`Total Hours`))
  )

all.equal(result, test)
# [1] TRUE

Logic: Extract one known status label and the final numeric value from each string, aggregate hours by status, then append a grand total.
Strengths: The solution is compact and does not need brittle delimiter-based parsing.
Areas for Improvement: The prompt mentions several suffix variants for hours, but the implemented logic is best understood as extracting the trailing numeric hours value.
Gem: Two stable regex anchors are enough to turn semi-structured prose into a summary table.

import pandas as pd

path = "900-999/928/928 Summarize Status Hours.xlsx"
input = pd.read_excel(path, usecols="A", skiprows=1, nrows=26)
test = pd.read_excel(path, usecols="C:D", skiprows=1, nrows=5)

result = (
    input
    .assign(
        Status=lambda d: d["Data"].str.extract(r'(Completed|In-Progress|On-Hold|Pending)'),
        Hours=lambda d: d["Data"].str.extract(r'(\d{1,3}(?:\.\d+)?)(?=\D*$)').astype(float)
    )
    .groupby("Status", as_index=False)["Hours"]
    .sum()
)
result = pd.concat(
    [result, pd.DataFrame({"Status": ["Total"], "Hours": [result["Hours"].sum()]})],
    ignore_index=True
)
result = result.rename(columns={"Hours": "Total Hours"})

print(result.equals(test))
# True

The Python version mirrors the same idea with str.extract(): one regex for the status and one regex for the final numeric chunk. Once those two fields are isolated, the rest is just a grouped sum plus a one-row total.

Difficulty Level

Easy / Medium

The aggregation is simple. The only real challenge is spotting the right text anchors to extract reliable fields from messy strings.